home *** CD-ROM | disk | FTP | other *** search
- ---PRIMER.DOC---
-
- What is A86?
-
- A86 is the name of Eric Isaacson's assembly language for the Intel 86-family
- of (IBM-PC and compatible) computers. Statements written in this language are
- used to specify machine instructions for the 86-family and to allocate memory
- space for program data. These human-readable statements are translated into a
- machine-readable form by a program called the A86 assembler. The input to the
- assembler is a source file (or a list of source files) containing assembly
- language statements. The output of the assembler is a file containing binary
- program code that can be run as a program on the PC.
-
-
- Elements of the A86 Assembly Language
-
- The statements in an A86 source file can be classified in three general
- categories: instruction statements, data allocation statements, and assembler
- directives. An instruction statement uses an easily-remembered name (a
- mnemonic) and possibly one or more operands to specify a machine instruction to
- be generated. A data allocation statement reserves, and optionally
- initializes, memory space for program data. An assembler directive is a
- statement that gives special instructions to the assembler. Directives are
- unlike the instruction and data allocation statements in that they do not
- specify the actual contents of memory. Examples of the three types of A86
- statements are given below. These are provided to give you a general idea of
- what the different kinds of statements look like.
-
-
- Instruction Statements
-
- MOV AX,BX
- CALL SORT_PROCEDURE
- ADD AL,7
-
-
- Data Allocation Statements
-
- A_VARIABLE DW 0
- DB 'HELLO'
-
-
- Assembler Directives
-
- CODE SEGMENT
- ITEM_COUNT EQU 5
-
-
- The statements in an A86 source file are made up of keywords, identifiers,
- numbers, strings, special characters, and comments. A keyword is a symbol that
- has special meaning to the assembler, such as an instruction mnemonic (MOV,
- CALL) or some other reserved word in the assembly language (DB, SEGMENT, EQU).
- Identifiers are programmer-defined symbols, used to represent such things as
- variables, labels in the code, and numerical constants. Identifiers may contain
- letters, numbers, and the characters _, @, $, and ?, but must begin with a
- letter, _, or @. The identifier name is considered unique up to 31 characters,
- but it can be of any length (up to 255 characters). Examples of identifiers
- are: COUNT, L1, and A_BYTE.
-
- Numbers in A86 may be expressed as decimal, hexadecimal, octal, or binary.
- These must begin with a decimal digit and, except in the case of a decimal or
- hexadecimal number, must end with "x" followed by a letter identifying the base
- of the number. A number without an identifying base is hexadecimal if the first
- digit is 0; decimal if the first digit is 1 through 9. Examples of A86 numbers
- are: 123 (decimal), 0ABC (hexadecimal), 1776xQ (octal), and 10100110xB (binary).
-
- Strings are characters enclosed in single-quotes. Examples of strings are: '1st
- string' and 'SIGN-ON MESSAGE, V1.0'. The single-quote is one of many special
- characters used in the assembly language. Others, run together in a list, are:
- $?;:=,[].+-()*/>. The space and tab characters are also special characters,
- used as separators in the assembly language.
-
- A comment is a sequence of characters used for program documentation only; it is
- ignored by the assembler. Comments begin with a semicolon (;) and run to the
- end of the line on which they are started. Examples of lines with comments are
- shown below:
-
- ; This entire line is a comment.
- MOV AX,BX ; This is a comment next to an instruction statement.
-
- Alternatively, for compatibility with IBM's assembler, I provide the COMMENT
- directive. The next non-blank character after COMMENT is a delimiter to a
- comment that can run across many lines; all text is ignored, until a second
- instance of the demiliter is seen. For example,
-
- COMMENT 'This comment
- runs across two lines'
-
- I don't like COMMENT, becuase I think it's very dangerous. If, for example,
- you have two COMMENTs in your program, and you forget to close the first one,
- the assembler will happily ignore all source code between the comments. If
- that source code does not happen to contain any labels referenced elsewhere,
- the error may not be detected until your program blows up. For multiline
- comments, I urge you to simply start each line with a semicolon.
-
-
- Statements in the A86 are line-oriented, which means that statements may not be
- broken across line boundaries. A86 source lines may be entered in a free-form
- fashion; that is, without regard to the column-orientation of the symbols and
- special characters.
-
- PLEASE NOTE: Because an A86 line is free-formatted, there is no need for you to
- put the operands to your instructions in a separate column! The only reason
- that 99% of the assembly-language programs out there in the world have opernads
- in a separate column is that some IBM assembler written back in 1953 required
- it. It makes no sense to have operands in a spearate column, so STOP DOING IT!
-
-
- Opreand Typing and Code Generation
-
- A86 is a strongly typed assembly language. What this means is that operands to
- instructions (registers, variables, labels, constants) have a type attribute
- associated with them which tells the assembler something about them. For
- example, the operand 4 has type number, which tells the assembler that it is a
- numerical constant, rather than a register or an address in the code or data.
- The following discussion explains the types associated with instruction operands
- and how this type information is used to generate particular machine opcodes
- from general purpose instruction mnemonics.
-
- Registers
-
- The 8086 has 8 general-purpose word registers: AX,BX,CX,DX,SI,DI,BP, and SP.
- The first four of those registers are subdivided into 8 general-purpose byte
- registers AH,AL,BH,BL,CH,CL,DH, and DL. There are also 4 16-bit segment
- registers CS,DS,ES, and SS, used for addressing memory; and the implicit
- instruction-pointer register.
-
- Variables
-
- A variable is a unit of program data with a symbolic name, residing at a
- specific location in 8086 memory. A variable is given a type at the time it is
- defined, which indicates the number of bytes associated with its symbol.
- Variables defined with a DB statement are given type BYTE (one byte), and those
- defined with the DW statement are given type WORD (two bytes). Examples:
-
- BYTE_VAR DB 0 ; A byte variable.
- WORD_VAR DW 0 ; A word variable.
-
-
- Labels
-
- A label is a symbol referring to a location in the program code. It is defined
- as an identifier, followed by a colon (:), used to represent the location of a
- particular instruction or data structure. Such a label may be on a line by
- itself or it may immediately precede an instruction statement (on the same
- line). In the following example, LABEL_1 and LABEL_2 are both labels for the
- MOV AL,BL instruction.
-
- LABEL_1:
- LABEL_2: MOV AL,BL
-
- In the A86 assembly language, labels have a type identical to that of
- constants. Thus, the instruction MOV BX,LABEL_2 is accepted, and the code
- to move the immediate constant address of LABEL2 into BX, is generated.
-
- IMPORTANT: you must understand the distinction between a label and a variable,
- because you may generate a different instruction than you intended if you
- confuse them. For example, if you declare X: DW ?, the colon following the X
- means that X is a label; the instruction MOV SI,X moves the immediate constant
- address of X into the SI register. On the other hand, if you declare X DW ?,
- with no colon, then X is a word variable; the same instruction MOV SI,X now does
- something different: it loads the run-time value of the memory word X into the
- SI register.
-
-
- Constants
-
- A constant is a numerical value computed from an assembly-time expression. For
- example, 123 and 3 + 2 - 1 both represent constants. A constant differs from an
- a variable in that it specifies a pure number, known by the assembler before the
- program is run, rather than a number fetched from memory when the program is
- running.
-
-
- Generating Opcodes from General Purpose Mnemonics
-
- My A86 assembly language is modeled after Intel's 8086 language, which uses
- general purpose mnemonics to represent classes of machine instructions rather
- than having a different mnemonic for each opcode. For example, the MOV mnemonic
- is used for all of the following: move byte register to byte register, load word
- register from memory, load byte register with constant, move word register to
- memory, move immediate value to word register, move immediate value to memory,
- etc. This feature saves you from having to distinguish "move" from "load,"
- "move constant" from "move memory," "move byte" from "move word," etc.
-
- Because the same general purpose mnemonic can apply to several different machine
- opcodes, A86 uses the type information associated with an instruction's
- operands in determining the particular opcode to produce. The type information
- associated with instruction operands is also used to discover programmer errors,
- such as attempting to move a word register to a byte register.
-
- The examples that follow illustrate the use of operand types in generating
- machine opcodes and discovering programmer errors. In each of the examples, the
- MOV instruction produces a different 8086 opcode, or an error. The symbols used
- in the examples are assumed to be defined as follows: BVAR is a byte variable,
- WVAR is a word variable, and LAB is a label. As you examine these MOV
- instructions, notice that, in each case, the operand on the right is considered
- to be the source and the operand on the left is the destination. This is a
- general rule that applies to all two-operand instruction statements.
-
- MOV AX,BX ; (8B) Move word register to word register.
- MOV AX,BL ; ERROR: Type conflict (word,byte).
- MOV CX,5 ; (B9) Move constant to word register.
- MOV BVAR,AL ; (A0) Move AL-register to byte in memory.
- MOV AL,WVAR ; ERROR: Type conflict (byte,word).
- MOV LAB,5 ; ERROR: Can't use a label/constant as destination to MOV.
- MOV WVAR,SI ; (89) Move word register to word in memory.
- MOV BL,1024 ; ERROR: Constant is too large to fit in a byte.
-